Crate multiversion
This crate provides the target and multiversion attributes for implementing function multiversioning.
Many CPU architectures have a variety of instruction set extensions that provide additional functionality. Common examples are single instruction, multiple data (SIMD) extensions such as SSE and AVX on x86/x86-64 and NEON on ARM/AArch64. When available, these extended features can provide significant speed improvements to some functions. These optional features cannot be haphazardly compiled into programs: executing an instruction that the CPU does not support will result in a crash.
Function multiversioning is the practice of compiling multiple versions of a function with various features enabled and safely detecting which version to use at runtime.
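To make the idea concrete, the following is a minimal hand-written sketch of what multiversioning amounts to, using only the standard library (the target_feature attribute and the is_x86_feature_detected macro). The function names square_generic and square_avx are illustrative assumptions and are not the names this crate generates.

```rust
// Baseline version: compiled with only the features enabled for the whole program.
fn square_generic(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

// AVX version: the same body, but the compiler is allowed to emit AVX instructions.
// It is `unsafe` because calling it on a CPU without AVX is undefined behavior.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

// Dispatcher: detects CPU support at runtime and picks a version.
pub fn square(x: &mut [f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // AVX support was confirmed on the line above.
            unsafe { square_avx(x) };
            return;
        }
    }
    // Any other architecture, or an x86 CPU without AVX.
    square_generic(x);
}
```

The attributes in this crate are meant to remove this kind of repetitive boilerplate.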
Cargo features
There is one cargo feature, std, enabled by default. When enabled, multiversion will use CPU feature detection at runtime to dispatch the appropriate function. Disabling this feature will only allow compile-time function dispatch using #[cfg(target_feature)], which allows the crate to be used in #![no_std] crates.
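As a rough illustration of compile-time dispatch, the sketch below selects one of two definitions with #[cfg(target_feature)] and needs no runtime detection. It uses only built-in attributes; the returned label is an assumption added purely to make the selection visible and is not something this crate provides.

```rust
// Selected only when the whole program is compiled with AVX enabled,
// for example with `-C target-feature=+avx`.
#[cfg(target_feature = "avx")]
pub fn square(x: &mut [f32]) -> &'static str {
    for v in x.iter_mut() {
        *v *= *v;
    }
    "avx"
}

// Fallback used when AVX is not enabled at compile time.
#[cfg(not(target_feature = "avx"))]
pub fn square(x: &mut [f32]) -> &'static str {
    for v in x.iter_mut() {
        *v *= *v;
    }
    "generic"
}
```

Exactly one of the two definitions exists in any given build, so the choice costs nothing at runtime but cannot adapt to the CPU the program ends up running on.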
Capabilities
The intention of this crate is to allow any function, other than trait methods, to be multiversioned. If any functions do not work please file an issue on GitHub.
The multiversion macro produces additional functions adjacent to the tagged function which do not correspond to a trait member. If you would like to multiversion a trait method, instead try multiversioning a free function or struct method and calling it from the trait method.
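The workaround could look roughly like the sketch below. The trait Square, the type Samples and the function square_impl are hypothetical names chosen for illustration. The crate's attributes are shown only as comments so that the sketch compiles without the crate; in real use they would be applied to square_impl.

```rust
// In real use this free function would carry the crate's attributes, e.g.
//   #[multiversion]
//   #[clone(target = "[x86|x86_64]+avx")]
// They are left as comments here so the sketch builds with std alone.
fn square_impl(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

// Hypothetical trait whose method cannot be multiversioned directly.
pub trait Square {
    fn square(&mut self);
}

// Hypothetical type implementing the trait.
pub struct Samples(pub Vec<f32>);

impl Square for Samples {
    fn square(&mut self) {
        // The trait method only forwards to the free function.
        square_impl(&mut self.0);
    }
}
```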
Target specification strings
Targets for the target and multiversion attributes are specified as a combination of architecture (as specified in the target_arch attribute) and feature (as specified in the target_feature attribute). A single architecture can be specified as:
"arch"
"arch+feature"
"arch+feature1+feature2"
while multiple architectures can be specified as:
"[arch1|arch2]"
"[arch1|arch2]+feature"
"[arch1|arch2]+feature1+feature2"
The following are all valid target specification strings:
"x86" (matches the "x86" architecture)
"x86_64+avx+avx2" (matches the "x86_64" architecture with the "avx" and "avx2" features)
"[mips|mips64|powerpc|powerpc64]" (matches any of the "mips", "mips64", "powerpc" or "powerpc64" architectures)
"[arm|aarch64]+neon" (matches either the "arm" or "aarch64" architectures with the "neon" feature)
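To make the grammar above concrete, here is a small parser sketch that splits a target specification string into its architectures and features. It is not the crate's own parser: the TargetSpec type, the parse_target function and the set of characters accepted in names are all assumptions made for illustration.

```rust
/// Hypothetical parsed form of a target specification string.
#[derive(Debug, PartialEq)]
pub struct TargetSpec {
    pub architectures: Vec<String>,
    pub features: Vec<String>,
}

/// Returns true for a non-empty name made of characters assumed to be
/// valid in architecture and feature names (for example "sse4.1").
fn is_valid_name(s: &str) -> bool {
    !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '.' || c == '-')
}

/// Parses "arch", "arch+f1+f2" or "[arch1|arch2]+f1+f2".
/// Returns None when the string does not follow that shape.
pub fn parse_target(spec: &str) -> Option<TargetSpec> {
    // Everything before the first '+' names the architecture(s).
    let mut parts = spec.split('+');
    let arch_part = parts.next()?;

    let architectures: Vec<String> = match arch_part.strip_prefix('[') {
        // Bracketed form: a '|'-separated list that must end with ']'.
        Some(rest) => rest
            .strip_suffix(']')?
            .split('|')
            .map(str::to_string)
            .collect(),
        // Plain form: a single architecture.
        None => vec![arch_part.to_string()],
    };

    // Every remaining '+'-separated part is a feature.
    let features: Vec<String> = parts.map(str::to_string).collect();

    let all_valid = architectures
        .iter()
        .chain(features.iter())
        .all(|s| is_valid_name(s));

    if all_valid {
        Some(TargetSpec { architectures, features })
    } else {
        None
    }
}
```

With this sketch, "[arm|aarch64]+neon" yields two architectures and one feature, while malformed input such as "x86+" or "[x86|x86_64+avx" is rejected.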
Example
The following example is a good candidate for optimization with SIMD. The function square optionally uses the AVX instruction set extension on x86 or x86-64. The SSE instruction set extension is part of x86-64, but is optional on x86 so the square function optionally detects that as well. This is automatically implemented by the multiversion attribute.
The following works by compiling multiple clones of the function with various features enabled and detecting which to use at runtime. If none of the targets match the current CPU (e.g. an older x86-64 CPU, or another architecture such as ARM), a clone without any features enabled is used.
    use multiversion::multiversion;

    #[multiversion]
    #[clone(target = "[x86|x86_64]+avx")]
    #[clone(target = "x86+sse")]
    fn square(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }
The following produces a nearly identical function, but instead of cloning the function, the implementations are manually specified. This is typically more useful when the implementations aren't identical, such as when using explicit SIMD instructions instead of relying on compiler optimizations.
    use multiversion::{multiversion, target};

    #[target("[x86|x86_64]+avx")]
    unsafe fn square_avx(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }

    #[target("x86+sse")]
    unsafe fn square_sse(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }

    #[multiversion]
    #[specialize(target = "[x86|x86_64]+avx", fn = "square_avx", unsafe = true)]
    #[specialize(target = "x86+sse", fn = "square_sse", unsafe = true)]
    fn square(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }
Static dispatching
Sometimes it may be useful to call multiversioned functions from other multiversioned functions. In these situations it would be inefficient to perform feature detection multiple times. Additionally, the runtime detection prevents the function from being inlined. In this situation, the dispatch helper macro allows bypassing feature detection:
    use multiversion::multiversion;

    #[multiversion]
    #[clone(target = "[x86|x86_64]+avx")]
    #[clone(target = "x86+sse")]
    fn square(x: &mut [f32]) {
        for v in x {
            *v *= *v
        }
    }

    #[multiversion]
    #[clone(target = "[x86|x86_64]+avx")]
    #[clone(target = "x86+sse")]
    fn square_plus_one(x: &mut [f32]) {
        dispatch!(square(x)); // this function call bypasses feature detection
        for v in x {
            *v += 1.0;
        }
    }
The dispatch macro supports either paths or function calls:
dispatch!(foo)
dispatch!(Self::foo::<A, B>)
dispatch!(foo(a, b))
dispatch!(self.foo::<A, B>(a, b))
The statically dispatched function must be multiversioned over a subset of CPU features supported by the caller function. For example, a function compiled for x86_64+avx+avx2 cannot statically dispatch a function compiled for x86_64+avx, but a function compiled for x86_64+avx may statically dispatch a multiversioned function compiled for both [x86|x86_64]+avx and x86+sse since an exact feature match exists for that architecture.
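The effect of static dispatch can be approximated by hand with the standard library alone, as in the sketch below. It is not what the dispatch macro expands to; the function names are illustrative assumptions. The point is that the AVX version of the caller calls the AVX version of the callee directly, so feature detection runs once, in the outermost dispatcher.

```rust
fn square_generic(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

fn square_plus_one_generic(x: &mut [f32]) {
    // Generic caller pairs with the generic callee.
    square_generic(x);
    for v in x.iter_mut() {
        *v += 1.0;
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn square_plus_one_avx(x: &mut [f32]) {
    // Static dispatch: this function already requires AVX, so the AVX
    // version of the callee is called directly with no second detection.
    unsafe { square_avx(x) };
    for v in x.iter_mut() {
        *v += 1.0;
    }
}

// The only place where runtime feature detection happens.
pub fn square_plus_one(x: &mut [f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // AVX support was confirmed on the line above.
            unsafe { square_plus_one_avx(x) };
            return;
        }
    }
    square_plus_one_generic(x);
}
```

Because the inner call targets a function compiled with the same features, the compiler is also free to inline it, which a runtime-dispatched call would prevent.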
Conditional compilation
The #[cfg] attribute allows conditional compilation based on the target architecture and features; however, this does not take into account additional features specified by #[target_feature]. In this scenario, the #[target_cfg] helper attribute provides conditional compilation in functions tagged with multiversion or target.
The #[target_cfg] attribute supports all, any, and not (just like #[cfg]) and supports the following keys:
target: takes a target specification string as a value and is true if the target matches the function's target
    #[multiversion::multiversion]
    #[clone(target = "[x86|x86_64]+avx")]
    #[clone(target = "[arm|aarch64]+neon")]
    fn print_arch() {
        #[target_cfg(target = "[x86|x86_64]+avx")]
        println!("avx");

        #[target_cfg(target = "[arm|aarch64]+neon")]
        println!("neon");

        #[target_cfg(not(any(target = "[x86|x86_64]+avx", target = "[arm|aarch64]+neon")))]
        println!("generic");
    }
Macros
are_cpu_features_detected | Detects CPU features. |
Attribute Macros
multiversion | Provides function multiversioning. |
target | Provides a less verbose equivalent to the |